CLUEMAKER: A LANGUAGE FOR APPROXIMATE RECORD MATCHING (Practice-Oriented)
Abstract
We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same thing based on the values of the records’ attributes. For example, a clue may predict a match if the records have identical values for the first name attribute. The values of the clues can then be used as input to a matching algorithm, such as a machine-learning technique or a hand-written decision tree, to compute a match decision. ClueMaker is based on Java and is compiled to Java source code. Therefore, ClueMaker is easily accessible to many programmers, allows the integration of any Java library, runs on virtually any platform, supports Unicode, and is more easily accepted by IT departments that try to minimize the number of distinct languages in use. ChoiceMaker Technologies has used ClueMaker successfully over the past two years in a variety of approximate record matching tasks.
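The idea of clues feeding a matching algorithm can be illustrated with a minimal sketch in plain Java. This is not ClueMaker syntax, which the abstract does not show. Every name here (`ClueSketch`, `Clue`, `Decision`, the attribute names, and the simple vote in `decide`) is a hypothetical chosen for illustration, and the vote merely stands in for the machine-learning technique or hand-written decision tree mentioned above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Illustrative only: plain Java, not ClueMaker syntax. All names are hypothetical.
final class ClueSketch {

    /** The decision a clue predicts when it fires. */
    enum Decision { MATCH, DIFFER }

    /** A clue pairs a predicted decision with a test on two records. */
    static final class Clue {
        final String name;
        final Decision predicts;
        final BiPredicate<Map<String, String>, Map<String, String>> test;

        Clue(String name, Decision predicts,
             BiPredicate<Map<String, String>, Map<String, String>> test) {
            this.name = name;
            this.predicts = predicts;
            this.test = test;
        }
    }

    /** True if both records have the attribute and the values are identical. */
    static boolean same(Map<String, String> a, Map<String, String> b, String attr) {
        String x = a.get(attr);
        return x != null && x.equals(b.get(attr));
    }

    /** True if both records have the attribute and the values differ. */
    static boolean differs(Map<String, String> a, Map<String, String> b, String attr) {
        String x = a.get(attr);
        String y = b.get(attr);
        return x != null && y != null && !x.equals(y);
    }

    /** Assumed example clues; the first mirrors the first-name example in the abstract. */
    static final List<Clue> CLUES = Arrays.asList(
        new Clue("sameFirstName", Decision.MATCH, (a, b) -> same(a, b, "firstName")),
        new Clue("sameLastName", Decision.MATCH, (a, b) -> same(a, b, "lastName")),
        new Clue("differentBirthYear", Decision.DIFFER, (a, b) -> differs(a, b, "birthYear")));

    /** Evaluates every clue; element i is true when clue i fires. */
    static boolean[] evaluate(Map<String, String> a, Map<String, String> b) {
        boolean[] fired = new boolean[CLUES.size()];
        for (int i = 0; i < fired.length; i++) {
            fired[i] = CLUES.get(i).test.test(a, b);
        }
        return fired;
    }

    /** Stand-in for the matching algorithm: a simple vote over the fired clues. */
    static Decision decide(Map<String, String> a, Map<String, String> b) {
        boolean[] fired = evaluate(a, b);
        int score = 0;
        for (int i = 0; i < fired.length; i++) {
            if (fired[i]) {
                score += CLUES.get(i).predicts == Decision.MATCH ? 1 : -1;
            }
        }
        return score > 0 ? Decision.MATCH : Decision.DIFFER;
    }

    /** Builds a record from alternating attribute names and values. */
    static Map<String, String> record(String... kv) {
        Map<String, String> r = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put(kv[i], kv[i + 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        Map<String, String> a = record("firstName", "Ann", "lastName", "Lee", "birthYear", "1970");
        Map<String, String> b = record("firstName", "Ann", "lastName", "Lee", "birthYear", "1970");
        Map<String, String> c = record("firstName", "Ann", "lastName", "Ray", "birthYear", "1982");
        System.out.println(decide(a, b)); // two MATCH clues fire
        System.out.println(decide(a, c)); // one MATCH and one DIFFER clue fire
    }
}
```

In this sketch the clue values are kept separate from the decision step, so `decide` could be swapped for a trained model without touching the clues themselves.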
Related Resources
CLUEMAKER: A LANGUAGE FOR APPROXIMATE RECORD MATCHING (Complete Paper)
We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same thing based on the values of the records’ attributes. For example, a clue may predict match if the records have identical values for the first name attribute. The values of the clues can then be used as input to a machine-learni...
Adaptive Approximate Record Matching
Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...
The ChoiceMaker 2 Record Matching System
This paper describes the key features of an innovative record matching system called ChoiceMaker 2 developed by ChoiceMaker Technologies (CMT). We begin with an overview of the stages that a record matching system goes through to find an incoming “query record” in a database. We then consider the stages one by one: We sketch out our patent-pending process for identifying possible matches to the...
Real World Performance of Approximate String Comparators for use in Patient Matching
Medical record linkage is becoming increasingly important as clinical data is distributed across independent sources. To improve linkage accuracy we studied different name comparison methods that establish agreement or disagreement between corresponding names. In addition to exact raw name matching and exact phonetic name matching, we tested three approximate string comparators. The approximate...
The Effects of Task Orientation and Involvement Load on Learning Collocations
This study examined the effects of input-oriented and output-oriented tasks with different involvement load indices on Iranian EFL learners' comprehension and production of lexical collocations. To achieve this purpose, a sample of 180 intermediate-level EFL learners (both male and female) participated in the study. The participants were in six experimental groups. Each of the groups was random...
Publication date: 2003